Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP [dbnode] Refactor posting list operations to be lazily evaluated and zero copy #2791

Open
wants to merge 119 commits into
base: master
Choose a base branch
from

Conversation

robskillington
Copy link
Collaborator

@robskillington robskillington commented Oct 22, 2020

What this PR does / why we need it:

This reduces a significant amount of all index query allocations by implementing the conjunction search as a lazy iterator over existing postings lists. It also caches regexp compilations with an LRU cache and allows setting different sizes for the regexp cache.

This change completely re-implements roaring bitmaps for query execution, this means pilosa roaring bitmaps are only used and index segment build time and all queries execute new read only roaring bitmaps which are completely mmap backed without any bitmap container/other allocations involved.

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:

NONE

Does this PR require updating code package or user-facing documentation?:

NONE

Copy link
Collaborator

@arnikola arnikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM w nits

src/m3ninx/search/searcher/regexp.go Outdated Show resolved Hide resolved
src/m3ninx/search/searcher/term.go Outdated Show resolved Hide resolved
src/m3ninx/postings/types.go Show resolved Hide resolved
src/m3ninx/search/searcher/all.go Outdated Show resolved Hide resolved
src/m3ninx/search/searcher/lazy_postings_list.go Outdated Show resolved Hide resolved
src/m3ninx/search/searcher/lazy_postings_list.go Outdated Show resolved Hide resolved
Comment on lines 49 to 51
var (
union = make([]postings.List, 0, len(s.searchers))
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: don't need var block here

src/m3ninx/search/searcher/conjunction.go Outdated Show resolved Hide resolved
src/m3ninx/search/searcher/disjunction.go Outdated Show resolved Hide resolved
src/cmd/services/m3dbnode/config/cache.go Outdated Show resolved Hide resolved
@arnikola arnikola self-assigned this Oct 26, 2020
Copy link
Collaborator

@arnikola arnikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a cursory glance over but didn't go super deep on it, figuring a lot of it was very similar to Pilosa's implementation; if this is a bad assumption can go back with more discerning review

"github.com/uber-go/tally"
)

// LatencyBuckets are a set of latency buckets useful for measuring things.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit measuring latencies

src/cmd/services/m3dbnode/config/cache.go Show resolved Hide resolved
src/cmd/services/m3coordinator/ingest/metrics.go Outdated Show resolved Hide resolved
src/cmd/services/m3coordinator/ingest/metrics.go Outdated Show resolved Hide resolved
src/cmd/services/m3coordinator/ingest/metrics_test.go Outdated Show resolved Hide resolved
func (i *multiBitmap) IsEmpty() bool {
iter := i.Iterator()
hasAny := iter.Next()
_ = iter.Err()
Copy link
Collaborator

@arnikola arnikola Oct 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, should this return false if iter.Err() is non-nil? Otherwise no need to call Err

Comment on lines 253 to 259
type containerIterator interface {
NextContainer() bool
ContainerKey() uint64
ContainerUnion(ctx containerOpContext, target *bitmapContainer)
ContainerIntersect(ctx containerOpContext, target *bitmapContainer)
ContainerNegate(ctx containerOpContext, target *bitmapContainer)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do these need to be exported? If so, may need comments (and may be worth exporting containerIterator?)

i.bitmap.Reset(true)

intersects := i.filter(i.multiContainerIter.containerIters, multiContainerOpIntersect)
negates := i.filter(i.multiContainerIter.containerIters, multiContainerOpNegate)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this overwrite the i.filtered in intersects?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes, TY!

func (i *multiBitmapContainersIterator) NextContainer() bool {
if len(i.iters) != 0 {
// Exhausted.
return true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be

if len(i.iters) == 0 {
  return false
}


// Tell any other callers that we're done loading
if entry.loadingCh != nil {
close(entry.loadingCh)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice pattern 👍

}
// Only valid if all intersecting actually matched,
// if zero intersecting then postings does not contain ID.
return len(i.intersect) > 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit; can move this to 181, before checking intersectNegate lists

case elem.multiBitmap != nil:
it = elem.multiBitmap.containerIterator()

case elem.bitmap != nil:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: needs to fail if both set?

@robskillington robskillington changed the title [dbnode] Reduce conjunction search and regexp compilation index query allocations [dbnode] Refactor posting list operations to become zero copy and lazily evaluated Nov 1, 2020
@robskillington robskillington changed the title [dbnode] Refactor posting list operations to become zero copy and lazily evaluated [dbnode] Refactor posting list operations to be lazily evaluated and zero copy Nov 1, 2020

require.False(t, first.Equal(second))
}
// func TestRoaringPostingsListEqualWithOtherNonRoaring(t *testing.T) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; delete?

@@ -39,16 +39,17 @@ import (

func TestConcurrentQueries(t *testing.T) {
parameters := gopter.DefaultTestParameters()
seed := time.Now().UnixNano()
parameters.MinSuccessfulTests = 100
// seed := time.Now().UnixNano()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; can probably resume using rng seed

Comment on lines +50 to +52
// simpleReader, err := simpleSeg.Reader()
// require.NoError(t, err)
// simpleExec := executor.NewExecutor([]index.Reader{simpleReader})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; delete

require.NoError(t, err)
matchedDocs, err := collectDocs(dOrg)
require.NoError(t, err)
require.NoError(t, err, fmt.Sprintf("query: %v\n", q.String()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; require.NoError(t, err, q.String()) may be easier?

if okA && okB && countA != countB {
return false
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If using countfast here, we should also make sure that

if !okA && !okB { 
	return EqualIterator(a.Iterator(), b.Iterator())
}

return false

intersects []postings.List,
negates []postings.List,
) (postings.List, error) {
intersect := make([]readOnlyIterable, 0, len(intersects))
Copy link
Collaborator

@arnikola arnikola Nov 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: prealloc a slice of iterables := make([]readOnlyIterable, 0, len(intersects) + len(negates) and have both intersect and negate be taken from that, e.g. intersect, negate := iterables[:len(intersects)], iterables[len(intersects):]

Comment on lines +152 to +163
Contains(id postings.ID) bool
ContainerIterator() containerIterator
}

type containerIterator interface {
NextContainer() bool
ContainerKey() uint64
ContainerUnion(ctx containerOpContext, target *bitmapContainer)
ContainerIntersect(ctx containerOpContext, target *bitmapContainer)
ContainerNegate(ctx containerOpContext, target *bitmapContainer)
Err() error
Close()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; comments here

func (i *multiBitmap) IsEmpty() bool {
iter := i.Iterator()
hasAny := iter.Next()
_ = iter.Err()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do these need to check err?

…terministic query key, also returned optimized form
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants